Enter the name of the tissue you want to analyze.
tissue_of_interest = "Liver"
library(here)
source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
tiss = load_tissue_facs(tissue_of_interest)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
[1] "Scaling data matrix"
|
| | 0%
|
|====================================================================================================================| 100%
Calculating gene means
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
The loading function above includes basic preprocessing of the gene x cell matrix, including
We can visualize top genes in each principal component.
To further reduce variance in the data, we will project onto the top principal components. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. A decent rule of thumb is to pick the elbow in the plot below.
PCElbowPlot(object = tiss)
Choose the number of principal components to use.
# Set number of principal components.
n.pcs = 11
The clustering is performed using on a shared-nearest-neighbors graph on the cells. Two cells are connected in the graph of their k-neighborhoods of cells overlap. The (enhanced) Louvain algorithm looks for groups of cells with high modularity–more connections within the group than between groups. The resolution parameter determines the tradeoff between in-group connections and between-group connections in that objective. Higher resolution will give more clusters, lower resolution will give fewer.
# Set resolution
res.used <- 1
tiss <- FindClusters(object = tiss, reduction.type = "pca", dims.use = 1:n.pcs,
resolution = res.used, print.output = 0, save.SNN = TRUE)
We use tSNE solely to visualize the data.
tiss <- RunTSNE(object = tiss, dims.use = 1:n.pcs, seed.use = 10, perplexity=30)
TSNEPlot(object = tiss, do.label = T, pt.size = 1.2, label.size = 4)
Check expression of genes useful for indicating cell type.
hepatocyte: Alb, Ttr, Apoa1, and Serpina1c endothelial cells: Pecam1, Nrp1, Kdr+ and Oit3+ Kuppfer cells: Emr1, Clec4f, Cd68, Irf7 NK/NKT cells: Zap70, Il2rb, Nkg7, Cxcr6, Klr1c, Gzma B cells: Cd79a, Cd79b, Cd74 and Cd19
genes_hep = c('Alb', 'Ttr', 'Apoa1', 'Serpina1c')
genes_endo = c('Pecam1', 'Nrp1', 'Kdr','Oit3')
genes_kuppfer = c('Emr1', 'Clec4f', 'Cd68', 'Irf7')
genes_nk = c('Zap70', 'Il2rb', 'Nkg7', 'Cxcr6', 'Gzma')
genes_b = c('Cd79a', 'Cd79b', 'Cd74')
genes_all = c(genes_hep, genes_endo, genes_kuppfer, genes_nk, genes_b)
In the tSNE plots below, the intensity of each point represents the gene expression, scaled to mean 0 and variance 1 across all cells.
Dotplots show, for each cluster and gene, the fraction of cells with at least one read for the gene (circle size) and the average scaled expression for that gene among the cells expressing it (circle color).
The low but nonzero levels of Albumin present in all clusters is consistent with a small amount of leakage, either through physical contamination or index hopping. Nevertheless, the absolute levels of expression confirm a sharp difference between the hepatocyte clusters and the others.
To confirm the identity of a cluster, you can inspect the genes differentially expressed in that cluster compared to the others.
clust.markers7 <- FindMarkers(object = tiss, ident.1 = 7, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
| | 0 % ~calculating
|+ | 1 % ~01m 08s
|++ | 2 % ~01m 07s
|++ | 3 % ~01m 05s
|+++ | 4 % ~01m 03s
|+++ | 5 % ~01m 02s
|++++ | 6 % ~01m 01s
|++++ | 7 % ~60s
|+++++ | 8 % ~59s
|+++++ | 9 % ~58s
|++++++ | 10% ~57s
|++++++ | 11% ~56s
|+++++++ | 12% ~55s
|+++++++ | 13% ~54s
|++++++++ | 14% ~54s
|++++++++ | 15% ~53s
|+++++++++ | 16% ~53s
|+++++++++ | 17% ~52s
|++++++++++ | 18% ~52s
|++++++++++ | 19% ~51s
|+++++++++++ | 20% ~51s
|+++++++++++ | 21% ~50s
|++++++++++++ | 22% ~49s
|++++++++++++ | 23% ~49s
|+++++++++++++ | 24% ~48s
|+++++++++++++ | 25% ~47s
|++++++++++++++ | 26% ~47s
|++++++++++++++ | 27% ~47s
|+++++++++++++++ | 28% ~46s
|+++++++++++++++ | 29% ~45s
|++++++++++++++++ | 30% ~44s
|++++++++++++++++ | 31% ~44s
|+++++++++++++++++ | 32% ~43s
|+++++++++++++++++ | 33% ~42s
|++++++++++++++++++ | 34% ~42s
|++++++++++++++++++ | 35% ~42s
|+++++++++++++++++++ | 36% ~41s
|+++++++++++++++++++ | 37% ~40s
|++++++++++++++++++++ | 38% ~40s
|++++++++++++++++++++ | 39% ~39s
|+++++++++++++++++++++ | 40% ~39s
|+++++++++++++++++++++ | 41% ~38s
|++++++++++++++++++++++ | 42% ~38s
|++++++++++++++++++++++ | 43% ~38s
|+++++++++++++++++++++++ | 44% ~37s
|+++++++++++++++++++++++ | 45% ~37s
|++++++++++++++++++++++++ | 46% ~36s
|++++++++++++++++++++++++ | 47% ~35s
|+++++++++++++++++++++++++ | 48% ~35s
|+++++++++++++++++++++++++ | 49% ~34s
|++++++++++++++++++++++++++ | 51% ~33s
|++++++++++++++++++++++++++ | 52% ~33s
|+++++++++++++++++++++++++++ | 53% ~32s
|+++++++++++++++++++++++++++ | 54% ~31s
|++++++++++++++++++++++++++++ | 55% ~31s
|++++++++++++++++++++++++++++ | 56% ~30s
|+++++++++++++++++++++++++++++ | 57% ~29s
|+++++++++++++++++++++++++++++ | 58% ~29s
|++++++++++++++++++++++++++++++ | 59% ~28s
|++++++++++++++++++++++++++++++ | 60% ~27s
|+++++++++++++++++++++++++++++++ | 61% ~26s
|+++++++++++++++++++++++++++++++ | 62% ~26s
|++++++++++++++++++++++++++++++++ | 63% ~25s
|++++++++++++++++++++++++++++++++ | 64% ~24s
|+++++++++++++++++++++++++++++++++ | 65% ~24s
|+++++++++++++++++++++++++++++++++ | 66% ~23s
|++++++++++++++++++++++++++++++++++ | 67% ~22s
|++++++++++++++++++++++++++++++++++ | 68% ~22s
|+++++++++++++++++++++++++++++++++++ | 69% ~21s
|+++++++++++++++++++++++++++++++++++ | 70% ~20s
|++++++++++++++++++++++++++++++++++++ | 71% ~20s
|++++++++++++++++++++++++++++++++++++ | 72% ~19s
|+++++++++++++++++++++++++++++++++++++ | 73% ~18s
|+++++++++++++++++++++++++++++++++++++ | 74% ~18s
|++++++++++++++++++++++++++++++++++++++ | 75% ~17s
|++++++++++++++++++++++++++++++++++++++ | 76% ~16s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~16s
|+++++++++++++++++++++++++++++++++++++++ | 78% ~15s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~14s
|++++++++++++++++++++++++++++++++++++++++ | 80% ~13s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~13s
|+++++++++++++++++++++++++++++++++++++++++ | 82% ~12s
|++++++++++++++++++++++++++++++++++++++++++ | 83% ~11s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~11s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~10s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~09s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~09s
|++++++++++++++++++++++++++++++++++++++++++++ | 88% ~08s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~07s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~07s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~06s
|++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~05s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 93% ~05s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~04s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~03s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~03s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~02s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01m 05s
The top markers for cluster 7 include histocompatibility markers H2-*, consistent with the expression of other B-cell markers seen above.
clust.markers7
Using the markers, we can confidentaly label the clusters:
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
cluster.ids <- c(0, 1, 2, 3, 4, 5, 6, 7, 8)
free_annotation <- c(
"endothelial cell",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"kuppfer",
"hepatocyte",
"B cell",
"NK/NKT cells")
cell_ontology_class <-c(
"endothelial cell of hepatic sinusoid",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"Kupffer cell",
"hepatocyte",
"B cell",
"natural killer cell")
tiss = stash_annotations(tiss, cluster.ids, free_annotation, cell_ontology_class)
Color by metadata, like plate barcode, to check for batch effects. Here we see that the clusters are segregated by sex, though cell types are mixed within each population.
TSNEPlot(object = tiss, do.return = TRUE, group.by = "mouse.id")
Every cluster contains cell types from multiple mice.
table(FetchData(tiss, c('mouse.id','ident')) %>% droplevels())
ident
mouse.id 0 1 2 3 4 5 6 7 8
3_11_M 160 0 59 29 51 47 0 31 32
3_56_F 0 46 0 11 0 0 25 0 0
3_57_F 0 46 0 5 0 0 28 0 0
3_9_M 22 1 32 37 20 14 1 10 7
Color by cell ontology class on the original tSNE.
TSNEPlot(object = tiss, group.by = "cell_ontology_class")
filename = here('00_data_ingest', '04_tissue_robj_generated',
paste0("facs_", tissue_of_interest, "_seurat_tiss.Robj"))
print(filename)
[1] "/Users/josh/src/tabula-muris/00_data_ingest/04_tissue_robj_generated/facs_Liver_seurat_tiss.Robj"
save(tiss, file=filename)
# To reload a saved object
#filename = here('00_data_ingest', '04_tissue_robj_generated',
# paste0("facs_", tissue_of_interest, "_seurat_subtiss.Robj"))
#load(file=filename)
save_annotation_csv(tiss, tissue_of_interest, "facs")